
******************************************************************************
************* CREATE EV SAMPLE ***************************
******************************************************************************

use "$root/Data/Original/EV_alldata_new.dta", clear
rename stammnummer VID
merge m:1 VID using "$root/Data/Original/VID_PID_adress_new.dta"
keep if _merge==3
drop _merge
duplicates tag PID year, g(duplicates)
tab duplicates
drop if duplicates>0
save "$root/Data/Original/EV_alldata_PID_new.dta", replace


**Add the mileage variables: 
merge 1:1 VID using "$root/Data/Original/KM_driven.dta"
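*Optional check (a sketch, not part of the original workflow): tabulate the merge result
*to see how many observations are in the EV sample only (1), in the mileage file only (2),
*or matched in both (3) before any of them are dropped
tab _merge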

*Keep observations without KM_driven (_merge==1), since we would otherwise lose half of our "population";
*drop only the mileage records that have no match in the EV sample (_merge==2)
drop if _merge==2

drop _merge

*Generate a variable which measures years between inspection and first registration
gen year_use=inspect_year - year
*Some inspections took place around the turn of the year, so the difference is wrongly computed as -1; set these to 0
replace year_use=0 if year_use==-1

tab year_use

gen average_use=0
replace average_use=KM_driven/year_use if year_use>0

*If the inspection and the first registration fall in the same year, we set average use to missing.
*The sample also contains imported used cars, and treating their reported KM as driven within
*one year or less would skew the statistics.
replace average_use=. if year_use==0


*Keep one observation per PID: sort each PID by descending datum_1iv and keep the first row
duplicates tag PID, g(dup)
gsort -dup PID -datum_1iv
drop dup
duplicates drop PID, force
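
*Optional sanity check (a sketch, not part of the original workflow): isid exits with an error
*if PID does not uniquely identify the remaining observations
isid PID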

*Save the new dataset
save "$root/Data/Produced/EV_alldata_PID_new_withKM.dta", replace

